LISTING OF THE CLAIMS:

1. (Currently amended): A method for the segmentation of an audio stream into
semantic or syntactic units wherein the audio stream is provided in a digitized format,
comprising the steps of:

determining a fundamental frequency for the digitized audio stream; 

detecting changes of the fundamental frequency in the audio stream, wherein 
detecting the changes of the fundamental frequency includes providing a threshold value 
for estimates of the fundamental frequency's voicedness and determining whether the 
voicedness of the fundamental frequency estimates are higher or lower than the threshold 
value, and wherein the voicedness of the fundamental frequency estimates lower than the 
threshold value equals no voice, and wherein the voicedness of the fundamental 
frequency estimates higher than the threshold value equals voice; 

determining candidate boundaries for the semantic or syntactic units depending on 
the detected changes of the fundamental frequency; 

extracting a plurality of prosodic features in an environment of the audio stream 
where the voicedness of the fundamental frequency estimates are lower than the threshold 
value, wherein the environment is a period of time between 500 and 4000 milliseconds 
preceding and following the candidate boundaries;

extracting and combining [[a]] the plurality of prosodic features in a
neighborhood of the candidate boundaries; and

determining boundaries for the semantic or syntactic units depending only on the 
combined plurality of prosodic features.
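
The thresholding and candidate-boundary steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The function names, the frame length `frame_ms`, and the minimum pause length `min_pause_ms` are assumptions introduced for illustration only. The claim does not say how an estimate exactly equal to the threshold is treated; the sketch treats it as no voice.

```python
from typing import List, Tuple


def voiced_flags(voicedness: List[float], threshold: float) -> List[int]:
    """Return 1 for frames whose voicedness is above the threshold (voice), else 0 (no voice)."""
    # Assumption: a value equal to the threshold counts as "no voice".
    return [1 if v > threshold else 0 for v in voicedness]


def candidate_boundaries(
    flags: List[int], frame_ms: float, min_pause_ms: float
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) for every unvoiced run of sufficient length.

    min_pause_ms is a hypothetical parameter; the claim only requires that
    candidates depend on the detected changes of the fundamental frequency.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    # The trailing sentinel 1 closes an unvoiced run that reaches the end of the stream.
    for i, flag in enumerate(flags + [1]):
        if flag == 0 and start is None:
            start = i
        elif flag == 1 and start is not None:
            if (i - start) * frame_ms >= min_pause_ms:
                runs.append((start, i))
            start = None
    return runs
```

With 10 ms frames and a 20 ms minimum pause, a three-frame unvoiced run is reported as a candidate while a single unvoiced frame is ignored.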

2. (Canceled) 

3. (Previously Presented): The method according to claim 1, wherein defining an
index function for the fundamental frequency having a value = 0 if the voicedness of the
fundamental frequency is lower than the threshold value and having a value = 1 if the
voicedness of the fundamental frequency is higher than the threshold value.

4. (Previously Presented): The method according to claim 3, wherein extracting the
plurality of prosodic features is in an environment of the audio stream where the value of 
the index function is = 0. 

5. (Canceled)

6. (Previously Presented): The method according to claim 1, wherein at least one
prosodic feature is represented by the fundamental frequency. 
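
One possible reading of the feature extraction around a candidate boundary is sketched below. It is an illustration, not the claimed implementation. The feature names, the 10 ms frame length, and the choice of mean fundamental frequency before and after the pause are assumptions. The window length is checked against the 500 to 4000 millisecond range recited in claim 1.

```python
from statistics import mean
from typing import Dict, List, Tuple


def boundary_features(
    f0_hz: List[float],
    flags: List[int],
    pause: Tuple[int, int],
    frame_ms: float = 10.0,
    window_ms: float = 1000.0,
) -> Dict[str, float]:
    """Compute hypothetical prosodic features around one candidate boundary.

    pause is (start_frame, end_frame_exclusive) of an unvoiced run;
    flags holds the index function (0 = no voice, 1 = voice) per frame.
    """
    if not 500.0 <= window_ms <= 4000.0:
        raise ValueError("window_ms must lie between 500 and 4000 milliseconds")
    start, end = pause
    n = int(window_ms / frame_ms)  # window length in frames
    lo = max(0, start - n)
    # Keep only voiced frames, since F0 is undefined where the index function is 0.
    before = [f for f, v in zip(f0_hz[lo:start], flags[lo:start]) if v == 1]
    after = [f for f, v in zip(f0_hz[end:end + n], flags[end:end + n]) if v == 1]
    return {
        "pause_ms": (end - start) * frame_ms,
        "f0_mean_before": float(mean(before)) if before else 0.0,
        "f0_mean_after": float(mean(after)) if after else 0.0,
    }
```

The pause duration is taken from the unvoiced run itself, while the two fundamental-frequency means summarize the voiced frames on either side of it.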

7. (Canceled) 

8. (Original): The method according to claim 1, further comprising first detecting
speech and non-speech segments in the digitized audio stream and performing the steps 
of claim 1 thereafter only for detected speech segments. 

9. (Original): The method according to claim 8, wherein the detecting of speech and 
non-speech segments comprises utilizing the signal energy or signal energy changes, 
respectively, in the audio stream. 
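
A speech and non-speech detection based on signal energy, as recited in claims 8 and 9, might be sketched as follows. The fixed energy threshold, the non-overlapping frames, and the function names are assumptions made for illustration; the claims do not prescribe them.

```python
from typing import List


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_flags(energies: List[float], energy_threshold: float) -> List[bool]:
    """Mark a frame as speech when its energy exceeds the (hypothetical) threshold."""
    return [e > energy_threshold for e in energies]
```

Only the frames marked as speech would then be passed on to the steps of claim 1.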

10. (Original): The method according to claim 1, further comprising the step of 
performing a prosodic feature classification based on a predetermined classification tree. 
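
A classification of prosodic features by a predetermined tree, as recited in claim 10, can be illustrated with a small hand-written tree. The tree layout, the feature names `pause_ms` and `f0_reset_hz`, and both thresholds are hypothetical and do not come from the application.

```python
from typing import Dict, Union

Node = Union[str, Dict[str, object]]

# Hypothetical predetermined tree: inner nodes test one feature, leaves are labels.
EXAMPLE_TREE: Node = {
    "feature": "pause_ms",
    "threshold": 300.0,
    "low": "no_boundary",
    "high": {
        "feature": "f0_reset_hz",
        "threshold": 20.0,
        "low": "no_boundary",
        "high": "boundary",
    },
}


def classify(features: Dict[str, float], tree: Node) -> str:
    """Walk the tree from the root until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        value = features[node["feature"]]
        node = node["high"] if value > node["threshold"] else node["low"]
    return node
```

In this example a candidate is labeled a boundary only when both the pause and the fundamental-frequency reset exceed their thresholds.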

11. (Currently amended): An article of manufacture comprising a computer usable 
medium having computer readable program code means embodied therein for causing 
segmentation of an audio stream into semantic or syntactic units, wherein the audio 
stream is provided in a digitized format, the computer readable program code means in 
the article of manufacture comprising computer readable program code means for 
causing a computer to effect: 

determining a fundamental frequency for the digitized audio stream; 
detecting changes of the fundamental frequency in the audio stream, wherein 
detecting the changes of the fundamental frequency includes providing a threshold value 
for estimates of the fundamental frequency's voicedness and determining whether the
voicedness of the fundamental frequency estimates are higher or lower than the threshold 
value, and wherein the voicedness of the fundamental frequency estimates lower than the 
threshold value equals no voice, and wherein the voicedness of the fundamental 
frequency estimates higher than the threshold value equals voice; 

determining candidate boundaries for the semantic or syntactic units depending on 
the detected changes of the fundamental frequency; 

extracting a plurality of prosodic features in an environment of the audio stream 
where the voicedness of the fundamental frequency estimates are lower than the threshold 
value, wherein the environment is a period of time between 500 and 4000 milliseconds
preceding and following the candidate boundaries;

extracting and combining [[a]] the plurality of prosodic features in a
neighborhood of the candidate boundaries; and

determining boundaries for the semantic or syntactic units depending only on the 
combined plurality of prosodic features.

12. (Currently amended): A digital audio processing system for segmentation of a 
digitized audio stream into semantic or syntactic units comprising: 

means for determining a fundamental frequency for the digitized audio stream;

means for detecting changes of the fundamental frequency in the audio stream, 
wherein detecting the changes of the fundamental frequency includes providing a 
threshold value for estimates of the fundamental frequency's voicedness and determining 
whether the voicedness of the fundamental frequency estimates are higher or lower than 
the threshold value, and wherein the voicedness of the fundamental frequency estimates 
lower than the threshold value equals no voice, and wherein the voicedness of the 
fundamental frequency estimates higher than the threshold value equals voice; 

means for determining candidate boundaries for the semantic or syntactic units 
depending on the detected changes of the fundamental frequency; 

means for extracting a plurality of prosodic features in an environment of the 
audio stream where the voicedness of the fundamental frequency estimates are lower than 
the threshold value, wherein the environment is a period of time between 500 and 4000
milliseconds preceding and following the candidate boundaries;

means for extracting and combining [[a]] the plurality of prosodic features in a
neighborhood of the candidate boundaries; and

means for determining boundaries for the semantic or syntactic units depending
only on the combined plurality of prosodic features. 

13. (Original): An audio processing system according to claim 12, further comprising 
means for generating an index function for the voicedness of the fundamental frequency 
having a value = 0 if the voicedness of the fundamental frequency is lower than a 
predetermined threshold value and having a value = 1 if the voicedness fundamental 
frequency is higher than the threshold value. 

14. (Original): Audio processing system according to claim 12 or 13, further 
comprising means for detecting speech and non-speech segments in the digitized audio 
stream, particularly for detecting and analyzing the signal energy or signal energy 
changes, respectively, in the audio stream. 